Skip to content

Conversation

@mynameborat
Copy link
Contributor

Problem Statement

Few push errors in VPJ are currently not categorized as user errors and impact the SLA calculation for push job failures. Ww need to categorize errors appropriately to improve the accuracy of push job failures as part of SLA measurement.
In this PR we address 3 issues that are mis-categorized

  • Schema validation failure due to key or value schema mismatch between server and input data
  • ACL for store not setup correctly which results in push failures from headless accounts/users not part of store ACLs
  • Incorrect field name for key or value which is not present in input data and results in VeniceSchemaFieldNotFoundException

Solution

The solution is to handle these errors appropriately in VenicePushJob and invoking the correct PushJobCheckpoint for these errors.

Note: We can follow up with PR to move the responsibility of checkpointing to the exceptions themselves. It consolidates the categorization and avoids scattered error handling which is prone to missing scenarios and edge cases.

Code changes

  • Change controller createVersion code path to return new errorType for ACL issues
  • Introduce new errorType ACL_ERROR
  • Introduce new exceptions for store ACL failures and for shema mismatchs

How was this PR tested?

  • New unit tests added.
  • Modified or extended existing tests.

Will test E2E in LinkedIn setup to ensure the vheckpoints are reflected

Does this PR introduce any user-facing or breaking changes?

  • No. You can skip the rest of this section.
  • Yes. Clearly explain the behavior change and its impact.

} else if (missingReadAccess) {
errorMessage = String.format(errorMessage, "read");
} else {
errorMessage = String.format(errorMessage, "read and write");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dont see any calls with both missingWriteAccess and missingReadAccess being false.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically yes; This is meant to be more of a guard to not have an empty ACL error message.
e.g., I still haven't gotten to the bottom of this error message I noticed

Exception type: class com.linkedin.venice.exceptions.VeniceHttpException. Detailed message: Http Status 403 - Missing [{}] ACLs for user "<redacted>". Please setup ACLs for your store.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

who could be calling that?!! anyways, since it is either read or write Acl issues, why not make it a single boolean, true means missing write, false mean missing read?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants